GUIDANCE FOR INDUSTRY 1



Statistical Approaches 
to Establishing Bioequivalence 



This guidance represents the Food and Drug Administration's current thinking on this topic. It 
does not create or confer any rights for or on any person and does not operate to bind FDA or the 
public. An alternative approach may be used if such approach satisfies the requirements of the 
applicable statutes and regulations. 



I. INTRODUCTION 

This guidance provides recommendations to sponsors and applicants who intend, either before or after 
approval, to use equivalence criteria in analyzing in vivo or in vitro bioequivalence (BE) studies for 
investigational new drug applications (INDs), new drug applications (NDAs), abbreviated new drug 
applications (ANDAs) and supplements to these applications. This guidance discusses three 
approaches for BE comparisons: average, population, and individual. The guidance focuses on how to 
use each approach once a specific approach has been chosen. This guidance replaces a prior FDA 
guidance entitled Statistical Procedures for Bioequivalence Studies Using a Standard Two- 
Treatment Crossover Design, which was issued in July 1992. 

II. BACKGROUND 
A. General 

Requirements for submitting bioavailability (BA) and BE data in NDAs, ANDAs, and 
supplements, the definitions of BA and BE, and the types of in vivo studies that are appropriate 
to measure BA and establish BE are set forth in 21 CFR part 320. This guidance provides 
recommendations on how to meet provisions of part 320 for all drug products. 

Defined as relative BA, BE involves comparison between a test (T) and reference (R) drug 
product, where T and R can vary, depending on the comparison to be performed (e.g., to-be- 
marketed dosage form versus clinical trial material, generic drug versus reference listed drug, 



1 This guidance has been prepared by the Population and Individual Bioequivalence Working Group of the
Biopharmaceutics Coordinating Committee in the Office of Pharmaceutical Science, Center for Drug Evaluation and 
Research (CDER) at the Food and Drug Administration (FDA). 
drug product changed after approval versus drug product before the change). Although BA
and BE are closely related, BE comparisons normally rely on (1) a criterion, (2) a confidence 
interval for the criterion, and (3) a predetermined BE limit. BE comparisons could also be used 
in certain pharmaceutical product line extensions, such as additional strengths, new dosage 
forms (e.g., changes from immediate release to extended release), and new routes of 
administration. In these settings, the approaches described in this guidance can be used to 
determine BE. The general approaches discussed in this guidance may also be useful when 
assessing pharmaceutical equivalence or performing equivalence comparisons in clinical 
pharmacology studies and other areas. 

B. Statistical 

In the July 1992 guidance on Statistical Procedures for Bioequivalence Studies Using a 
Standard Two-Treatment Crossover Design (the 1992 guidance), CDER recommended that 
a standard in vivo BE study design be based on the administration of either single or multiple 
doses of the T and R products to healthy subjects on separate occasions, with random 
assignment to the two possible sequences of drug product administration. The 1992 guidance
further recommended that statistical analysis for pharmacokinetic measures, such as area under 
the curve (AUC) and peak concentration (Cmax), be based on the two one-sided tests 
procedure to determine whether the average values for the pharmacokinetic measures 
determined after administration of the T and R products were comparable. This approach is 
termed average bioequivalence and involves the calculation of a 90% confidence interval for 
the ratio of the averages (population geometric means) of the measures for the T and R 
products. To establish BE, the calculated confidence interval should fall within a BE limit, 
usually 80-125% for the ratio of the product averages. 2 In addition to this general approach, 
the 1992 guidance provided specific recommendations for (1) logarithmic transformation of 
pharmacokinetic data, (2) methods to evaluate sequence effects, and (3) methods to evaluate 
outlier data. 

Although average BE is recommended for a comparison of BA measures in most BE studies, 
this guidance describes two new approaches, termed population and individual 
bioequivalence. These new approaches may be useful, in some instances, for analyzing 
in vitro and in vivo BE studies. 3 The average BE approach focuses only on the comparison of 
population averages of a BE measure of interest and not on the variances of the measure for the 



2 For a broad range of drugs, a BE limit of 80 to 125% for the ratio of the product averages has been adopted
for use of an average BE criterion. Generally, the BE limit of 80 to 125% is based on a clinical judgment that a test 
product with BA measures outside this range should be denied market access. 

3 For additional recommendations on in vivo studies, see the FDA guidance for industry on Bioavailability
and Bioequivalence Studies for Orally Administered Drug Products - General Considerations. Additional
recommendations on in vitro studies will be provided in an FDA guidance for industry on Bioavailability and
Bioequivalence Studies for Nasal Aerosols and Nasal Sprays for Local Action, when finalized.
T and R products. The average BE method does not assess a subject-by-formulation
interaction variance, that is, the variation in the average T and R difference among individuals. 
In contrast, population and individual BE approaches include comparisons of both averages and 
variances of the measure. The population BE approach assesses total variability of the measure 
in the population. The individual BE approach assesses within-subject variability for the T and 
R products, as well as the subject-by-formulation interaction. 

III. STATISTICAL MODEL

Statistical analyses of BE data are typically based on a statistical model for the logarithm of the BA
measures (e.g., AUC and Cmax). The model is a mixed-effects or two-stage linear model. Each
subject, j, theoretically provides a mean for the log-transformed BA measure for each formulation,
$\mu_{Tj}$ and $\mu_{Rj}$ for the T and R formulations, respectively. The model assumes that these
subject-specific means come from a distribution with population means $\mu_T$ and $\mu_R$, and
between-subject variances $\sigma_{BT}^2$ and $\sigma_{BR}^2$, respectively. The model allows for a
correlation, $\rho$, between $\mu_{Tj}$ and $\mu_{Rj}$. The subject-by-formulation interaction variance
component (Schall and Luus 1993), $\sigma_D^2$, is related to these parameters as follows:

$$\sigma_D^2 = \text{variance of } (\mu_{Tj} - \mu_{Rj}) = (\sigma_{BT} - \sigma_{BR})^2 + 2(1 - \rho)\,\sigma_{BT}\sigma_{BR}$$   Equation 1

For a given subject, the observed data for the log-transformed BA measure are assumed to be
independent observations from distributions with means $\mu_{Tj}$ and $\mu_{Rj}$, and within-subject
variances $\sigma_{WT}^2$ and $\sigma_{WR}^2$. The total variances for each formulation are defined as the sum
of the within- and between-subject components (i.e., $\sigma_{TT}^2 = \sigma_{WT}^2 + \sigma_{BT}^2$ and
$\sigma_{TR}^2 = \sigma_{WR}^2 + \sigma_{BR}^2$). For analysis of crossover
studies, the means are given additional structure by the inclusion of period and sequence effect terms.
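
Equation 1 follows from expanding the variance of a difference of two correlated quantities. The short derivation below is added for clarity and uses only the symbols defined in this section:

```latex
\begin{aligned}
\sigma_D^2 &= \operatorname{Var}(\mu_{Tj} - \mu_{Rj}) \\
           &= \sigma_{BT}^2 + \sigma_{BR}^2 - 2\rho\,\sigma_{BT}\sigma_{BR} \\
           &= (\sigma_{BT} - \sigma_{BR})^2 + 2(1 - \rho)\,\sigma_{BT}\sigma_{BR}
\end{aligned}
```

Both terms in the last line are nonnegative, so the interaction variance vanishes when the between-subject standard deviations are equal and the subject-specific means are perfectly correlated ($\rho = 1$).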

IV. STATISTICAL APPROACHES FOR BIOEQUIVALENCE

The general structure of a BE criterion is that a function ($\Theta$) of population measures should be
demonstrated to be no greater than a specified value ($\theta$). Using the terminology of statistical hypothesis
testing, this is accomplished by testing the hypothesis $H_0: \Theta > \theta$ versus $H_A: \Theta \le \theta$ at a desired level of
significance, often 5%. Rejection of the null hypothesis $H_0$ (i.e., demonstrating that the estimate of $\Theta$ is
statistically significantly less than $\theta$) results in a conclusion of BE. The choice of $\Theta$ and $\theta$ differs in
average, population, and individual BE approaches.

A general objective in assessing BE is to compare the log-transformed BA measure after administration 
of the T and R products. As detailed in Appendix A, population and individual approaches are based 
on the comparison of an expected squared distance between the T and R formulations to the expected 
squared distance between two administrations of the R formulation. An acceptable T formulation is one
where the T-R distance is not substantially greater than the R-R distance. In both population and 
individual BE approaches, this comparison appears as a comparison to the reference variance, which is 
referred to as scaling to the reference variability. 

Population and individual BE approaches, but not the average BE approach, allow two types of scaling: 
reference-scaling and constant-scaling. Reference-scaling means that the criterion used is scaled to the 
variability of the R product, which effectively widens the BE limit for more variable reference products. 
Although generally sufficient, use of reference-scaling alone could unnecessarily narrow the BE limit for 
drugs and/or drug products that have low variability but a wide therapeutic range. This guidance, 
therefore, recommends mixed-scaling for the population and individual BE approaches (section IV.B 
and C). With mixed scaling, the reference-scaled form of the criterion should be used if the reference 
product is highly variable; otherwise, the constant-scaled form should be used. 

A. Average Bioequivalence 

The following criterion is recommended for average BE: 

$$(\mu_T - \mu_R)^2 \le \theta_A^2$$   Equation 2

where

$\mu_T$ = population average response of the log-transformed measure for the T
formulation

$\mu_R$ = population average response of the log-transformed measure for the R
formulation

as defined in section III above. 
This criterion is equivalent to:

$$-\theta_A \le (\mu_T - \mu_R) \le \theta_A$$   Equation 3

and, usually, $\theta_A = \ln(1.25)$.
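
As a quick check of how the usual value of the limit in Equation 3 relates to the 80-125% range quoted in section II.B, the following minimal sketch back-transforms the limit from the natural-log scale to the ratio scale. The variable names are illustrative only.

```python
import math

# theta_A is the BE limit on the natural-log scale (Equation 3).
theta_a = math.log(1.25)

# Back-transform the limits to the ratio scale of the geometric means.
lower_ratio = math.exp(-theta_a)  # equals 1 / 1.25
upper_ratio = math.exp(theta_a)

print(f"{lower_ratio:.2f} to {upper_ratio:.2f}")  # prints "0.80 to 1.25"
```

The limits are symmetric on the log scale, which is why the lower limit on the ratio scale is 1/1.25 = 0.80 rather than 0.75.
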
B. Population Bioequivalence

The following mixed-scaling approach is recommended for population BE (i.e., use the
reference-scaled method if the estimate of $\sigma_{TR} > \sigma_{T0}$ and the constant-scaled method if the
estimate of $\sigma_{TR} \le \sigma_{T0}$).

The recommended criteria are: 

Reference-Scaled:

$$\frac{(\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2)}{\sigma_{TR}^2} \le \theta_P$$   Equation 4

or

Constant-Scaled:

$$\frac{(\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2)}{\sigma_{T0}^2} \le \theta_P$$   Equation 5

where:

$\mu_T$ = population average response of the log-transformed measure for the T formulation
$\mu_R$ = population average response of the log-transformed measure for the R formulation
$\sigma_{TT}^2$ = total variance (i.e., sum of within- and between-subject variances) of the T formulation
$\sigma_{TR}^2$ = total variance (i.e., sum of within- and between-subject variances) of the R formulation
$\sigma_{T0}^2$ = specified constant total variance
$\theta_P$ = BE limit
Equations 4 and 5 represent an aggregate approach where a single criterion on the left-hand
side of the equation encompasses two major components: (1) the difference between the T and
R population averages ($\mu_T - \mu_R$), and (2) the difference between the T and R total variances
($\sigma_{TT}^2 - \sigma_{TR}^2$). This aggregate measure is scaled to the total variance of the R product or to a
constant value ($\sigma_{T0}^2$, a standard that relates to a limit for the total variance), whichever is
greater.

The specification of both $\sigma_{T0}$ and $\theta_P$ relies on the establishment of standards. The generation of
these standards is discussed in Appendix A. When the population BE approach is used, in
addition to meeting the BE limit based on confidence bounds, the point estimate of the
geometric test/reference mean ratio should fall within 80-125%.
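
The mixed-scaling rule for the population criterion can be sketched as a small function. This is an illustration only: the function and argument names are hypothetical, all inputs are assumed to be on the log scale, and the value used for the constant total variance in the example is a placeholder, not a standard taken from this guidance.

```python
def population_be_criterion(mu_t, mu_r, var_tt, var_tr, var_t0):
    """Point value of the population BE criterion (Equations 4 and 5).

    var_tt, var_tr: total variances of the T and R formulations.
    var_t0: specified constant total variance (a standard set elsewhere).
    """
    numerator = (mu_t - mu_r) ** 2 + (var_tt - var_tr)
    # Mixed scaling: reference-scaled if the R total variance exceeds the
    # constant, otherwise constant-scaled.
    denominator = var_tr if var_tr > var_t0 else var_t0
    return numerator / denominator


# Hypothetical inputs, chosen only to exercise both branches.
high_variability = population_be_criterion(0.05, 0.0, 0.10, 0.09, var_t0=0.04)
low_variability = population_be_criterion(0.05, 0.0, 0.03, 0.02, var_t0=0.04)
```

A point value of the criterion is not by itself a BE conclusion. The comparison with the BE limit is made on the 95% upper confidence bound, as described in section VI.B.
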

C. Individual Bioequivalence

The following mixed-scaling approach is one approach for individual BE (i.e., use the
reference-scaled method if the estimate of $\sigma_{WR} > \sigma_{W0}$, and the constant-scaled method if the estimate of
$\sigma_{WR} \le \sigma_{W0}$). Also see section VII.D, Discontinuity, for further discussion.

The recommended criteria are: 

Reference-Scaled:

$$\frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + (\sigma_{WT}^2 - \sigma_{WR}^2)}{\sigma_{WR}^2} \le \theta_I$$   Equation 6

or

Constant-Scaled:

$$\frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + (\sigma_{WT}^2 - \sigma_{WR}^2)}{\sigma_{W0}^2} \le \theta_I$$   Equation 7

where:

$\mu_T$ = population average response of the log-transformed measure for the T formulation
$\mu_R$ = population average response of the log-transformed measure for the R formulation
$\sigma_D^2$ = subject-by-formulation interaction variance component
$\sigma_{WT}^2$ = within-subject variance of the T formulation
$\sigma_{WR}^2$ = within-subject variance of the R formulation
$\sigma_{W0}^2$ = specified constant within-subject variance
$\theta_I$ = BE limit



Equations 6 and 7 represent an aggregate approach where a single criterion on the left-hand
side of the equation encompasses three major components: (1) the difference between the T
and R population averages ($\mu_T - \mu_R$), (2) subject-by-formulation interaction ($\sigma_D^2$), and (3) the
difference between the T and R within-subject variances ($\sigma_{WT}^2 - \sigma_{WR}^2$). This aggregate
measure is scaled to the within-subject variance of the R product or to a constant value ($\sigma_{W0}^2$, a
standard that relates to a limit for the within-subject variance), whichever is greater.

The specification of both $\sigma_{W0}$ and $\theta_I$ relies on the establishment of standards. The generation of
these standards is discussed in Appendix A. When the individual BE approach is used, in 
addition to meeting the BE limit based on confidence bounds, the point estimate of the 
geometric test/reference mean ratio should fall within 80-125%. 
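
The individual criterion and the accompanying point-estimate condition can be sketched together as follows. The function name, the returned keys, and the example inputs are hypothetical; natural logarithms are assumed, and the constant within-subject variance is passed in rather than fixed, since its value is a standard established elsewhere.

```python
import math


def individual_be_summary(mu_t, mu_r, var_d, var_wt, var_wr, var_w0):
    """Point value of the individual BE criterion (Equations 6 and 7).

    var_d: subject-by-formulation interaction variance component.
    var_wt, var_wr: within-subject variances of the T and R formulations.
    var_w0: specified constant within-subject variance.
    """
    numerator = (mu_t - mu_r) ** 2 + var_d + (var_wt - var_wr)
    reference_scaled = var_wr > var_w0
    denominator = var_wr if reference_scaled else var_w0
    gmr = math.exp(mu_t - mu_r)  # geometric mean ratio, assuming natural logs
    return {
        "criterion": numerator / denominator,
        "scaling": "reference" if reference_scaled else "constant",
        "gmr_within_80_125": 0.80 <= gmr <= 1.25,
    }


# Hypothetical inputs for illustration.
summary = individual_be_summary(0.05, 0.0, 0.01, 0.05, 0.06, var_w0=0.04)
```
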

V. STUDY DESIGN

A. Experimental Design 

1. Nonreplicated Designs

A conventional nonreplicated design, such as the standard two-formulation, two-period, 
two-sequence crossover design, can be used to generate data where an average or 
population approach is chosen for BE comparisons. Under certain circumstances, 
parallel designs can also be used. 

2. Replicated Crossover Designs

Replicated crossover designs can be used irrespective of which approach is selected to 
establish BE, although they are not necessary when an average or population approach 
is used. Replicated crossover designs are critical when an individual BE approach is 
used to allow estimation of within-subject variances for the T and R measures and the 
subject-by-formulation interaction variance component. The following four-period, 
two-sequence, two-formulation design is recommended for replicated BE studies (see 
Appendix B for further discussion of replicated crossover designs). 



                   Period
              1    2    3    4
Sequence 1    T    R    T    R
Sequence 2    R    T    R    T

For this design, the same lots of the T and R formulations should be used for the 
replicated administration. Each period should be separated by an adequate washout 
period. 

Other replicated crossover designs are possible. For example, a three-period design, 
as shown below, could be used. 

                 Period
              1    2    3
Sequence 1    T    R    T
Sequence 2    R    T    R



A greater number of subjects would be encouraged for the three-period design 
compared to the recommended four-period design to achieve the same statistical power 
to conclude BE (see Appendix C). 
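
This guidance does not prescribe a randomization procedure. As one possible illustration, the sketch below assigns subjects at random to the two sequences of the recommended four-period design in equal (or near-equal) numbers. The function name, the sequence labels, and the use of a seed are assumptions made for the example.

```python
import random


def allocate_sequences(subject_ids, sequences=("TRTR", "RTRT"), seed=None):
    """Randomly assign subjects to sequences in equal or near-equal numbers."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects alternately to the sequences.
    return {sid: sequences[i % len(sequences)] for i, sid in enumerate(shuffled)}


# Hypothetical study with 24 subjects; the seed makes the list reproducible.
allocation = allocate_sequences(range(1, 25), seed=1)
```

For the three-period design, the same function could be called with sequences=("TRT", "RTR").
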

B. Sample Size and Dropouts 

A minimum number of 12 evaluable subjects should be included in any BE study. When an 
average BE approach is selected using either nonreplicated or replicated designs, methods 
appropriate to the study design should be used to estimate sample sizes. The number of 
subjects for BE studies based on either the population or individual BE approach can be 
estimated by simulation if analytical approaches for estimation are not available. Further 
information on sample size is provided in Appendix C. 

Sponsors should enter a sufficient number of subjects in the study to allow for dropouts. 
Because replacement of subjects during the study could complicate the statistical model and 
analysis, dropouts generally should not be replaced. Sponsors who wish to replace dropouts 
during the study should indicate this intention in the protocol. The protocol should also state 
whether samples from replacement subjects, if not used, will be assayed. If the dropout rate is
high and sponsors wish to add more subjects, a modification of the statistical analysis may be 
recommended. Additional subjects should not be included after data analysis unless the trial 
was designed from the beginning as a sequential or group sequential design. 

VI. STATISTICAL ANALYSIS 

The following sections provide recommendations on statistical methodology for assessment of average, 
population, and individual BE. 

A. Logarithmic Transformation 

1. General Procedures 

This guidance recommends that BE measures (e.g., AUC and Cmax) be log-transformed
using either common logarithms to the base 10 or natural logarithms (see
Appendix D). The choice of common or natural logs should be consistent and should 
be stated in the study report. The limited sample size in a typical BE study precludes a 
reliable determination of the distribution of the data set. Sponsors and/or applicants are 
not encouraged to test for normality of error distribution after log-transformation, nor 
should they use normality of error distribution as a reason for carrying out the statistical 
analysis on the original scale. Justification should be provided if sponsors or applicants 
believe that their BE study data should be statistically analyzed on the original rather 
than on the log scale. 
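
The choice of base changes the scale of the log differences but not the back-transformed ratio, provided the same base is used throughout, including for the antilog and for the BE limit. A minimal sketch with hypothetical AUC values:

```python
import math

auc_t, auc_r = 1180.0, 1000.0  # hypothetical values on the original scale

# Difference of logs in each base.
diff_ln = math.log(auc_t) - math.log(auc_r)
diff_log10 = math.log10(auc_t) - math.log10(auc_r)

# Back-transforming with the matching base recovers the same ratio.
ratio_from_ln = math.exp(diff_ln)
ratio_from_log10 = 10 ** diff_log10

# The upper BE limit of 1.25 expressed on each log scale.
limit_ln = math.log(1.25)
limit_log10 = math.log10(1.25)
```

Mixing bases, for example comparing a base-10 difference against ln(1.25), would give a wrong conclusion, which is one reason the base should be stated in the study report.
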

2. Presentation of Data 

The drug concentration in biological fluid determined at each sampling time point should 
be furnished on the original scale for each subject participating in the study. The 
pharmacokinetic measures of systemic exposure should also be furnished on the original 
scale. The mean, standard deviation, and coefficient of variation for each variable 
should be computed and tabulated in the final report. 

In addition to the arithmetic mean and associated standard deviation (or coefficient of 
variation) for the T and R products, geometric means (antilog of the means of the logs) 
should be calculated for selected BE measures. To facilitate BE comparisons, the 
measures for each individual should be displayed in parallel for the formulations tested. 
In particular, for each BE measure the ratio of the individual geometric mean of the T 
product to the individual geometric mean of the R product should be tabulated side by 
side for each subject. The summary tables should indicate in which sequence each 
subject received the product.
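
The summary quantities named above can be computed as in the following sketch. The function names and the data layout (a mapping from subject ID to a list of values) are assumptions for illustration; the geometric mean is computed as the antilog of the mean of the logs, as described in the text.

```python
import math
import statistics


def summarize(values):
    """Descriptive statistics on the original scale, plus the geometric mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    logs = [math.log(v) for v in values]
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100 * sd / mean,
        "geometric_mean": math.exp(statistics.mean(logs)),
    }


def individual_ratios(t_by_subject, r_by_subject):
    """T/R ratio of individual geometric means, keyed by subject.

    Each argument maps a subject ID to that subject's list of values
    (one value for nonreplicated designs, several for replicated ones).
    """
    ratios = {}
    for sid in t_by_subject:
        gm_t = math.exp(statistics.mean([math.log(v) for v in t_by_subject[sid]]))
        gm_r = math.exp(statistics.mean([math.log(v) for v in r_by_subject[sid]]))
        ratios[sid] = gm_t / gm_r
    return ratios
```
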



B. Data Analysis 

1. Average Bioequivalence 

a. Overview 

Parametric (normal-theory) methods are recommended for the analysis of log- 
transformed BE measures. For average BE using the criterion stated in 
equations 2 or 3 (section IV.A), the general approach is to construct a 90%
confidence interval for the quantity $\mu_T - \mu_R$ and to reach a conclusion of average
BE if this confidence interval is contained in the interval $[-\theta_A, \theta_A]$. Due to the
nature of normal-theory confidence intervals, this is equivalent to carrying out 
two one-sided tests of hypothesis at the 5% level of significance (Schuirmann 
1987). 

The 90% confidence interval for the difference in the means of the log- 
transformed data should be calculated using methods appropriate to the 
experimental design. The antilogs of the confidence limits obtained constitute 
the 90% confidence interval for the ratio of the geometric means between the T 
and R products. 
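
Once a model appropriate to the design has produced an estimate of the difference and its standard error, the interval and the decision can be sketched as below. The function name is hypothetical, natural logarithms are assumed, and the t critical value is supplied by the caller (the 95th percentile of the t distribution with the model's error degrees of freedom) rather than computed here.

```python
import math


def average_be_from_estimate(diff, se, t_crit, theta_a=math.log(1.25)):
    """90% CI for the difference in log means and the average BE decision.

    diff: estimated difference in means (T minus R) on the natural-log scale.
    se: standard error of diff.
    t_crit: 95th percentile of the t distribution for the error df.
    """
    lower = diff - t_crit * se
    upper = diff + t_crit * se
    return {
        "ci_log": (lower, upper),
        "ci_ratio": (math.exp(lower), math.exp(upper)),
        "average_be": -theta_a <= lower and upper <= theta_a,
    }


# Hypothetical estimate with 22 error df (t critical value about 1.717).
result = average_be_from_estimate(diff=0.05, se=0.04, t_crit=1.717)
```

Requiring the whole 90% interval to lie inside the limits gives the same decision as rejecting both one-sided null hypotheses at the 5% level.
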

b. Nonreplicated Crossover Designs 

For nonreplicated crossover designs, this guidance recommends parametric 
(normal-theory) procedures to analyze log-transformed BA measures. General 
linear model procedures available in PROC GLM in SAS or equivalent 
software are preferred, although linear mixed-effects model procedures can also 
be indicated for analysis of nonreplicated crossover studies. 

For example, for a conventional two-treatment, two-period, two-sequence (2 x 
2) randomized crossover design, the statistical model typically includes factors 
accounting for the following sources of variation: sequence, subjects nested in
sequences, period, and treatment. The Estimate statement in SAS PROC 
GLM, or equivalent statement in other software, should be used to obtain 
estimates for the adjusted differences between treatment means and the 
standard error associated with these differences. 
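
For a complete 2 x 2 crossover (no missing observations), the adjusted treatment difference and its standard error can also be obtained directly from within-subject differences. The sketch below is an illustration under that assumption, not a replacement for the general linear model procedure; the function name and data layout are hypothetical.

```python
import math
import statistics


def crossover_2x2_estimate(seq_tr, seq_rt):
    """Estimate the T minus R difference and its standard error.

    seq_tr: list of (period1, period2) log-values for subjects given T then R.
    seq_rt: list of (period1, period2) log-values for subjects given R then T.
    Returns (difference, standard error, error degrees of freedom).
    """
    # Within-subject T minus R differences in each sequence.
    d_tr = [p1 - p2 for p1, p2 in seq_tr]
    d_rt = [p2 - p1 for p1, p2 in seq_rt]
    n1, n2 = len(d_tr), len(d_rt)
    # Averaging the two sequence means cancels the period effect.
    diff = (statistics.mean(d_tr) + statistics.mean(d_rt)) / 2
    # Pooled variance of the differences within sequences.
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(d_tr)
              + (n2 - 1) * statistics.variance(d_rt)) / df
    se = math.sqrt(pooled / 4 * (1 / n1 + 1 / n2))
    return diff, se, df
```

With complete data, this should agree with the treatment estimate from a model containing sequence, subject nested in sequence, period, and treatment.
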

c. Replicated Crossover Designs 
Linear mixed-effects model procedures, available in PROC MIXED in SAS or
equivalent software, should be used for the analysis of replicated crossover 
studies for average BE. Appendix E includes an example of SAS program 
statements. 

d. Parallel Designs 

For parallel designs, the confidence interval for the difference of means in the 
log scale can be computed using the total between-subject variance. As in the 
analysis for replicated designs (section VI.B.1.c), equal variances should not
be assumed. 
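
One common way to avoid assuming equal variances in a two-group comparison is to use separate variance estimates with the Satterthwaite approximation to the degrees of freedom. The guidance does not name a specific method here, so the sketch below is an assumption about how the recommendation could be implemented; the function name is hypothetical.

```python
import math
import statistics


def parallel_diff_unequal_var(log_t, log_r):
    """Difference of log means for a parallel design without pooling variances.

    log_t, log_r: log-transformed measures for the T and R groups.
    Returns (difference, standard error, approximate degrees of freedom).
    """
    n_t, n_r = len(log_t), len(log_r)
    v_t = statistics.variance(log_t) / n_t
    v_r = statistics.variance(log_r) / n_r
    diff = statistics.mean(log_t) - statistics.mean(log_r)
    se = math.sqrt(v_t + v_r)
    # Satterthwaite approximation to the degrees of freedom.
    df = (v_t + v_r) ** 2 / (v_t ** 2 / (n_t - 1) + v_r ** 2 / (n_r - 1))
    return diff, se, df
```
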

2. Population Bioequivalence

a. Overview 

Analysis of BE data using the population approach (section IV.B) should focus 
first on estimation of the mean difference between the T and R for the log- 
transformed BA measure and estimation of the total variance for each of the 
two formulations. This can be done using relatively simple unbiased estimators 
such as the method of moments (MM) (Chinchilli 1996, and Chinchilli and 
Esinhart 1996). After the estimation of the mean difference and the variances 
has been completed, a 95% upper confidence bound for the population BE 
criterion can be obtained, or equivalently a 95% upper confidence bound for a 
linearized form of the population BE criterion can be obtained. Population BE 
should be considered to be established for a particular log-transformed BA 
measure if the 95% upper confidence bound for the criterion is less than or 
equal to the BE limit, $\theta_P$, or equivalently if the 95% upper confidence bound for
the linearized criterion is less than or equal to 0. 

To obtain the 95% upper confidence bound of the criterion, intervals based on 
validated approaches can be used. Validation approaches should be reviewed 
with appropriate staff in CDER. Appendix F includes an example of upper 
confidence bound determination using a population BE approach. 
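
This section does not write out the linearized criterion. One natural form, obtained by multiplying Equations 4 and 5 through by their positive denominators, is sketched below; the symbol $\eta_P$ is introduced here for illustration only.

```latex
\begin{aligned}
\text{Reference-scaled:}\quad
  \eta_P &= (\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2) - \theta_P\,\sigma_{TR}^2 \le 0 \\
\text{Constant-scaled:}\quad
  \eta_P &= (\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2) - \theta_P\,\sigma_{T0}^2 \le 0
\end{aligned}
```

Because the denominators are positive, each inequality holds exactly when the corresponding ratio form holds, which is why an upper bound of 0 for the linearized criterion corresponds to an upper bound of $\theta_P$ for the original criterion.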

b. Nonreplicated Crossover Designs 

For nonreplicated crossover studies, any available method (e.g., SAS PROC 
GLM or equivalent software) can be used to obtain an unbiased estimate of the 
mean difference in log-transformed BA measures between the T and R 
products. The total variance for each formulation should be estimated by the
usual sample variance, computed separately in each sequence and then pooled
across sequences. 

c. Replicated Crossover Designs 

For replicated crossover studies, the approach should be the same as for 
nonreplicated crossover designs, but care should be taken to obtain proper 
estimates of the total variances. One approach is to estimate the within- and 
between-subject components separately, as for individual BE (see section 
VI.B.3), and then sum them to obtain the total variance. The method for the 
upper confidence bound should be consistent with the method used for 
estimating the variances. 

d. Parallel Designs 

The estimate of the means and variances from parallel designs should be the 
same as for nonreplicated crossover designs. The method for the upper 
confidence bound should be modified to reflect independent rather than paired 
samples and to allow for unequal variances. 

3. Individual Bioequivalence 

Analysis of BE data using an individual BE approach (section IV.C) should focus on 
estimation of the mean difference between T and R for the log-transformed BA 
measure, the subject-by-formulation interaction variance, and the within-subject 
variance for each of the two formulations. For this purpose, we recommend the MM 
approach. 

To obtain the 95% upper confidence bound of a linearized form of the individual BE 
criterion, intervals based on validated approaches can be used. An example is 
described in Appendix G. After the estimation of the mean difference and the variances 
has been completed, a 95% upper confidence bound for the individual BE criterion can 
be obtained, or equivalently a 95% upper confidence bound for a linearized form of the 
individual BE criterion can be obtained. Individual BE should be considered to be 
established for a particular log-transformed BA measure if the 95% upper confidence 
bound for the criterion is less than or equal to the BE limit, $\theta_I$, or equivalently if the 95%
upper confidence bound for the linearized criterion is less than or equal to 0. 
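
As an illustration of a moment-based estimate, the sketch below estimates a within-subject variance from the two administrations of the same formulation in a replicated design. It assumes complete data and exactly two replicates per subject; the function name and data layout are hypothetical, and the procedure in Appendix G may differ in detail.

```python
import statistics


def within_subject_variance(replicate_pairs_by_sequence):
    """Moment-based estimate of a within-subject variance.

    replicate_pairs_by_sequence: one list per sequence, each holding
    (first, second) log-values from the two administrations of the same
    formulation to each subject in that sequence.
    """
    ss, df = 0.0, 0
    for pairs in replicate_pairs_by_sequence:
        diffs = [a - b for a, b in pairs]
        # The variance of (first - second) is twice the within-subject
        # variance; the period shift is common within a sequence and drops
        # out of the sample variance.
        ss += (len(diffs) - 1) * statistics.variance(diffs)
        df += len(diffs) - 1
    return ss / df / 2
```
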

The restricted maximum likelihood (REML) method may be useful to estimate mean 
differences and variances when subjects with some missing data are included in the 
statistical analysis. A key distinction between the REML and MM methods relates to 
differences in estimating variance terms and is further discussed in Appendix H.
Sponsors considering alternative methods to REML or MM are encouraged to discuss 
their approaches with appropriate CDER review staff prior to submitting their 
applications. 

VII. MISCELLANEOUS ISSUES 

A. Studies in Multiple Groups 

If a crossover study is carried out in two or more groups of subjects (e.g., if for logistical 
reasons only a limited number of subjects can be studied at one time), the statistical model 
should be modified to reflect the multigroup nature of the study. In particular, the model should 
reflect the fact that the periods for the first group are different from the periods for the second 
group. This applies to all of the approaches (average, population, and individual BE) described 
in this guidance. 

If the study is carried out in two or more groups and those groups are studied at different clinical 
sites, or at the same site but greatly separated in time (months apart, for example), questions 
may arise as to whether the results from the several groups should be combined in a single 
analysis. Such cases should be discussed with the appropriate CDER review division. 

A sequential design, in which the decision to study a second group of subjects is based on the 
results from the first group, calls for different statistical methods and is outside the scope of this 
guidance. Those wishing to use a sequential design should consult the appropriate CDER 
review division. 

B. Carryover Effects 

Use of crossover designs for BE studies allows each subject to serve as his or her own control 
to improve the precision of the comparison. One of the assumptions underlying this principle is 
that carryover effects (also called residual effects) are either absent (the response to a 
formulation administered in a particular period of the design is unaffected by formulations 
administered in earlier periods) or equal for each formulation and preceding formulation. If 
carryover effects are present in a crossover study and are not equal, the usual crossover 
estimate of $\mu_T - \mu_R$ could be biased. One limitation of a conventional two-formulation,
two-period, two-sequence crossover design is that the only statistical test available for the presence
of unequal carryover effects is the sequence test in the analysis of variance (ANOVA) for the 
crossover design. This is a between-subject test, which would be expected to have poor 
discriminating power in a typical BE study. Furthermore, if the possibility of unequal carryover 
effects cannot be ruled out, no unbiased estimate of $\mu_T - \mu_R$ based on within-subject
comparisons can be obtained with this design.



For replicated crossover studies, a within-subject test for unequal carryover effects can be 
obtained under certain assumptions. Typically only first-order carryover effects are considered 
of concern (i.e., the carryover effects, if they occur, only affect the response to the formulation 
administered in the next period of the design). Under this assumption, consideration of 
carryover effects could be more complicated for replicated crossover studies than for 
nonreplicated studies. The carryover effect could depend not only on the formulation that 
preceded the current period, but also on the formulation that is administered in the current 
period. This is called a direct-by-carryover interaction. The need to consider more than just 
simple first-order carryover effects has been emphasized (Fleiss 1989). With a replicated 
crossover design, a within-subject estimate of $\mu_T - \mu_R$ unbiased by general first-order carryover
effects can be obtained, but such an estimate could be imprecise, reducing the power of the 
study to conclude BE. 

In most cases, for both replicated and nonreplicated crossover designs, the possibility of 
unequal carryover effects is considered unlikely in a BE study under the following circumstances: 

• It is a single-dose study. 

• The drug is not an endogenous entity. 

• More than an adequate washout period has been allowed between periods of the study 
and in the subsequent periods the predose biological matrix samples do not exhibit a 
detectable drug level in any of the subjects. 

• The study meets all scientific criteria (e.g., it is based on an acceptable study protocol
and it contains sufficient validated assay methodology). 

The possibility of unequal carryover effects can also be discounted for multiple-dose studies 
and/or studies in patients, provided that the drug is not an endogenous entity and the studies 
meet all scientific criteria as described above. Under all other circumstances, the sponsor or 
applicant could be asked to consider the possibility of unequal carryover effects, including a 
direct-by-carryover interaction. If there is evidence of carryover effects, sponsors should 
describe their proposed approach in the study protocol, including statistical tests for the 
presence of such effects and procedures to be followed. Sponsors who suspect that carryover 
effects might be an issue may wish to conduct a BE study with parallel designs. 

C. Outlier Considerations 

Outlier data in BE studies are defined as subject data for one or more BA measures that are 
discordant with corresponding data for that subject and/or for the rest of the subjects in a study. 
Because BE studies are usually carried out as crossover studies, the most important type of 
subject outlier is the within-subject outlier, where one subject or a few subjects differ notably 
from the rest of the subjects with respect to a within-subject T-R comparison. The existence of 
a subject outlier with no protocol violations could indicate one of the following situations: 

1. Product Failure 

Product failure could occur, for example, when a subject exhibits an unusually high or 
low response to one or the other of the products because of a problem with the specific 
dosage unit administered. This could occur, for example, with a sustained and/or 
delayed-release dosage form exhibiting dose dumping or a dosage unit with a coating 
that inhibits dissolution. 

2. Subject-by-Formulation Interaction 

A subject-by-formulation interaction could occur when an individual is representative of 
subjects present in the general population in low numbers, for whom the relative BA of 
the two products is markedly different than for the majority of the population, and for 
whom the two products are not bioequivalent, even though they might be bioequivalent 
in the majority of the population. 

In the case of product failure, the unusual response could be present for either the T or R 
product. However, in the case of a subpopulation, even if the unusual response is observed on 
the R product, there could still be concern for lack of interchangeability of the two products. 
For these reasons, deletion of outlier values is generally discouraged, particularly for 
nonreplicated designs. With replicated crossover designs, the retest character of these designs 
should indicate whether to delete an outlier value or not. Sponsors or applicants with these 
types of data sets may wish to review how to handle outliers with appropriate review staff. 

D. Discontinuity 

The mixed-scaling approach has a discontinuity at the changeover point, $\sigma_{W0}$ (individual BE 
criterion) or $\sigma_{T0}$ (population BE criterion), from constant- to reference-scaling. For example, if 
the estimate of the within-subject standard deviation of the reference is just above the 
changeover point, the confidence interval will be wider than if the estimate is just below it. In this 
context, the confidence interval could pass the predetermined BE limit if the estimate is just below the 
boundary and could fail if it is just above. This guidance recommends that sponsors applying the 
individual BE approach may use either reference-scaling or constant-scaling at either side of the 
changeover point. With this approach, the multiple testing inflates the type I error rate slightly, 
to approximately 6.5%, but only over a small interval of $\sigma_{WR}$ (about 0.18-0.20). 
APPENDIX A 



Standards 

The equations in section IV call for standards to be established (i.e., $\sigma_{T0}$ and $\theta_P$ for assessment of 
population BE, $\sigma_{W0}$ and $\theta_I$ for individual BE). The recommended approach to establishing these 
standards is described below. 

A. $\sigma_{T0}$ and $\sigma_{W0}$ 

As indicated in section IV, a general objective in assessing BE should be to compare the 
difference in the BA log-measure of interest after the administration of the T and R formulations, 
T-R, with the difference in the same log-metric after two administrations of the R formulation, 
R-R'. 

1. Population Bioequivalence 

For population BE, the comparisons of interest should be expressed in terms of the ratio 
of the expected squared difference between T and R (administered to different 
individuals) and the expected squared difference between R and R' (administered to 
different individuals), as shown below. 

$$E(T - R)^2 = (\mu_T - \mu_R)^2 + \sigma_{TT}^2 + \sigma_{TR}^2 \qquad \text{Equation 8}$$

$$E(R - R')^2 = 2\sigma_{TR}^2 \qquad \text{Equation 9}$$

$$\frac{E(T - R)^2}{E(R - R')^2} = \frac{(\mu_T - \mu_R)^2 + \sigma_{TT}^2 + \sigma_{TR}^2}{2\sigma_{TR}^2} \qquad \text{Equation 10}$$



The population BE criterion in equation 4 (section IV.B.) is derived from equation 10, 
such that the criterion equals zero for two identical formulations. The square root of 
equation 10 yields the "population difference ratio" (PDR): 
$$PDR = \left[ \frac{(\mu_T - \mu_R)^2 + \sigma_{TT}^2 + \sigma_{TR}^2}{2\sigma_{TR}^2} \right]^{1/2} \qquad \text{Equation 11}$$

The PDR is the square root of the ratio of the expected squared T-R difference 
compared to the expected squared R-R' difference in the population. It should be 
noted that the PDR is monotonically related to the population BE criterion (PBC) 
described in equation 4 as follows: 

$$PDR = (PBC/2 + 1)^{1/2} \qquad \text{Equation 12}$$

Sponsors or applicants wishing to use the population BE approach should contact the 
Agency for further information on $\sigma_{T0}$. 
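
The relationship in Equation 12 can be checked numerically. The Python sketch below is illustrative only: it assumes that the reference-scaled population BE criterion of equation 4 has the form $[(\mu_T - \mu_R)^2 + \sigma_{TT}^2 - \sigma_{TR}^2]/\sigma_{TR}^2$, and the function names and input values are hypothetical rather than taken from this guidance.

```python
import math


def population_be_criterion(delta, var_tt, var_tr):
    """Assumed reference-scaled PBC: (delta^2 + var_TT - var_TR) / var_TR."""
    return (delta ** 2 + var_tt - var_tr) / var_tr


def population_difference_ratio(delta, var_tt, var_tr):
    """PDR as in Equation 11: sqrt(E(T-R)^2 / E(R-R')^2)."""
    return math.sqrt((delta ** 2 + var_tt + var_tr) / (2.0 * var_tr))


# Hypothetical log-scale inputs: mean difference and total variances of T and R.
delta, var_tt, var_tr = 0.05, 0.10, 0.09

pbc = population_be_criterion(delta, var_tt, var_tr)
pdr = population_difference_ratio(delta, var_tt, var_tr)
pdr_from_pbc = math.sqrt(pbc / 2.0 + 1.0)  # Equation 12

# Two identical formulations: criterion is zero and the ratio is one.
pbc_identical = population_be_criterion(0.0, 0.09, 0.09)
pdr_identical = population_difference_ratio(0.0, 0.09, 0.09)
```

Under these assumptions the two routes to the PDR agree, and identical formulations give a criterion of zero and a PDR of one, as the text states.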

2. Individual Bioequivalence 

For individual BE, the comparisons of interest should be expressed in terms of the ratio 
of the expected squared difference between T and R (administered to the same 
individual) and the expected squared difference between R and R' (two administrations 
of R to the same individual), as shown below. 

$$E(T - R)^2 = (\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 + \sigma_{WR}^2 \qquad \text{Equation 13}$$

$$E(R - R')^2 = 2\sigma_{WR}^2 \qquad \text{Equation 14}$$

$$\frac{E(T - R)^2}{E(R - R')^2} = \frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 + \sigma_{WR}^2}{2\sigma_{WR}^2} \qquad \text{Equation 15}$$



The individual BE criterion in equation 6 (section IV.C.) is derived from equation 15, 
such that the criterion equals zero for two identical formulations. The square root of 
equation 15 is the individual difference ratio (IDR): 

$$IDR = \left[ \frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 + \sigma_{WR}^2}{2\sigma_{WR}^2} \right]^{1/2} \qquad \text{Equation 16}$$

The IDR is the square root of the ratio of the expected squared T-R difference 
compared to the expected squared R-R' difference within an individual. The IDR is 
monotonically related to the individual BE criterion (IBC) described in equation 6 as 
follows: 

$$IDR = (IBC/2 + 1)^{1/2} \qquad \text{Equation 17}$$

This guidance recommends that $\sigma_{W0} = 0.2$, based on the consideration of the maximum 
allowable IDR of 1.25 (footnote 4). 

B. $\theta_P$ and $\theta_I$ 

The determination of $\theta_P$ and $\theta_I$ should be based on the consideration of the average BE criterion 
and the addition of variance terms to the population and individual BE criteria, as expressed by 
the formula below. 

$$\theta = \frac{\text{average BE limit} + \text{variance factor}}{\text{variance}}$$

1. Population Bioequivalence 

$$\theta_P = \frac{(\ln 1.25)^2 + \epsilon_P}{\sigma_{T0}^2} \qquad \text{Equation 18}$$

The value of $\epsilon_P$ for population BE is guided by the consideration of the variance term 
($\sigma_{TT}^2 - \sigma_{TR}^2$) added to the average BE criterion. Sponsors or applicants wishing to use 
the population BE approach should contact the Agency for further information on $\epsilon_P$ 
and $\theta_P$. 



4 The IDR upper bound of 1.25 is drawn from the currently used upper BE limit of 1.25 for the average BE criterion. 



2. Individual Bioequivalence 

$$\theta_I = \frac{(\ln 1.25)^2 + \epsilon_I}{\sigma_{W0}^2} \qquad \text{Equation 19}$$

The value of $\epsilon_I$ for individual BE is guided by the consideration of the estimate of 
subject-by-formulation interaction ($\sigma_D$) as well as the difference in within-subject 
variability ($\sigma_{WT}^2 - \sigma_{WR}^2$) added to the average BE criterion. The recommended 
allowance for the variance term ($\sigma_{WT}^2 - \sigma_{WR}^2$) is 0.02. In addition, this guidance 
recommends a $\sigma_D^2$ allowance of 0.03. The magnitude of $\sigma_D$ is associated with the 
percentage of individuals whose average T to R ratios lie outside 0.8-1.25. It is 
estimated that if $\sigma_D = 0.1356$, ~10% of the individuals would have their average ratios 
outside 0.8-1.25, even if $\mu_T - \mu_R = 0$. When $\sigma_D = 0.1741$, the probability is ~20%. 

Accordingly, on the basis of consideration for both $\sigma_D$ and variability ($\sigma_{WT}^2 - \sigma_{WR}^2$) in 
the criterion, this guidance recommends that $\epsilon_I = 0.05$. 
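
The arithmetic behind these recommendations can be reproduced with a short calculation. The Python sketch below is illustrative only: it evaluates Equation 19 with the recommended values (variance allowance 0.05, changeover standard deviation 0.2), and it assumes that individual log-ratios are normally distributed with mean zero and standard deviation equal to the subject-by-formulation interaction. The function names are hypothetical.

```python
import math

LN_1_25 = math.log(1.25)  # average BE limit on the natural log scale


def theta(epsilon, sigma_0):
    """Upper limit from Equations 18 and 19: ((ln 1.25)^2 + epsilon) / sigma_0^2."""
    return (LN_1_25 ** 2 + epsilon) / sigma_0 ** 2


def fraction_outside_limits(sigma_d):
    """Share of individuals with an average T/R ratio outside 0.8-1.25,
    assuming a zero mean difference and normal log-ratios with SD sigma_d."""
    z = LN_1_25 / sigma_d
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * (1.0 - normal_cdf)


theta_i = theta(epsilon=0.02 + 0.03, sigma_0=0.2)  # allowances 0.02 and 0.03
share_low = fraction_outside_limits(0.1356)
share_high = fraction_outside_limits(0.1741)
```

With these inputs the individual BE limit is about 2.49, and the two interaction values give shares close to the 10% and 20% quoted above.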
APPENDIX B 



Choice of Specific Replicated Crossover Designs 



Appendix B describes why FDA prefers replicated crossover designs with only two sequences, and 
why we recommend the specific designs described in section V.A of this guidance. 



1. Reasons Unrelated to Carryover Effects 



Each unique combination of sequence and period in a replicated crossover design can be called a cell of 
the design. For example, the two-sequence, four-period design recommended in 
section V.A.1 has 8 cells. The four-sequence, four-period design below has 16 cells. 

                      Period 
                 1    2    3    4 
Sequence    1    T    R    R    T 
            2    R    T    T    R 
            3    T    T    R    R 
            4    R    R    T    T 



The total number of degrees-of-freedom attributable to comparisons among the cells is just the number 
of cells minus one (unless there are cells with no observations). 

The fixed effects that are usually included in the statistical analysis are sequence, period, and treatment 
(i.e., formulation). The number of degrees-of-freedom attributable to each fixed effect is generally equal 
to the number of levels of the effect, minus one. Thus, in the case of the two-sequence, four-period 
design recommended in section V.A.1, there would be 2-1=1 degree-of-freedom due to sequence, 
4-1=3 degrees-of-freedom due to period, and 2-1=1 degree-of-freedom due to treatment, for a total of 
1+3+1=5 degrees-of-freedom due to the three fixed effects. Because these 5 degrees-of-freedom do 
not account for all 7 degrees-of-freedom attributable to the eight cells of the design, the fixed effects 
model is not saturated. There could be some controversy as to whether a fixed effects model that 
accounts for more or all of the degrees-of-freedom due to cells (i.e., a more saturated fixed effects 
model) should be used. For example, an effect for sequence-by-treatment interaction might be included 
in addition to the three main effects (sequence, period, and treatment). Alternatively, a 
sequence-by-period interaction effect might be included, which would fully saturate the fixed effects model. 
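
The degrees-of-freedom bookkeeping described above can be written out as a small calculation. The Python sketch below is illustrative only; the function name and the returned keys are hypothetical, and it assumes every cell of the design contains observations.

```python
def degrees_of_freedom(n_sequences, n_periods, n_treatments=2):
    """Compare cell degrees of freedom with those used by the main effects model."""
    cells = n_sequences * n_periods
    df_cells = cells - 1  # comparisons among cells
    df_main = (n_sequences - 1) + (n_periods - 1) + (n_treatments - 1)
    return {
        "cells": cells,
        "df_cells": df_cells,
        "df_main_effects": df_main,
        "df_not_accounted_for": df_cells - df_main,
    }


two_sequence = degrees_of_freedom(n_sequences=2, n_periods=4)
four_sequence = degrees_of_freedom(n_sequences=4, n_periods=4)
```

For the two-sequence, four-period design this gives 8 cells, 7 cell degrees of freedom, and 5 used by the main effects, so 2 are left unaccounted for. The four-sequence, four-period design leaves 8 unaccounted for, which is one way to see why the choice between main effects and saturated models matters more there.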



If the replicated crossover design has only two sequences, use of only the three main effects (sequence, 
period, and treatment) in the fixed effects model or use of a more saturated model makes little difference 
to the results of the analysis, provided there are no missing observations and the study is carried out in 
one group of subjects. The least squares estimate of $\mu_T - \mu_R$ will be the same for the main effects model 
and for the saturated model. Also, the method of moments (MM) estimators of the variance terms in 
the model used in some approaches to assessment of population and individual BE (see Appendix H), 
which represent within-sequence comparisons, are generally fully efficient regardless of whether the 
main effects model or the saturated model is used. 

If the replicated crossover design has more than two sequences, these advantages are no longer 
present. Main effects models will generally produce different estimates of $\mu_T - \mu_R$ than saturated models 
(unless the number of subjects in each sequence is equal), and there is no well-accepted basis for 
choosing between these different estimates. Also, MM estimators of variance terms will be fully efficient 
only for saturated models, while for main effects models fully efficient estimators would have to include 
some between-sequence components, complicating the analysis. Thus, use of designs with only two 
sequences minimizes or avoids certain ambiguities due to the method of estimating variances or due to 
specific choices of fixed effects to be included in the statistical model. 

2. Reasons Related to Carryover Effects 

One of the reasons to use the four-sequence, four-period design described above is that it is thought to 
be optimal if carryover effects are included in the model. Similarly, the two-sequence, three-period 
design 

                      Period 
                 1    2    3 
Sequence    1    T    R    R 
            2    R    T    T 

is thought to be optimal among three-period replicated crossover designs. Both of these designs are 
strongly balanced for carryover effects, meaning that each treatment is preceded by each other 
treatment and itself an equal number of times. 
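
The definition of strong balance can be checked directly by counting how often each treatment follows each treatment. The Python sketch below is illustrative only; the function names are hypothetical, and the alternating TRTR/RTRT design appears purely as a contrast case.

```python
from collections import Counter


def carryover_counts(sequences):
    """Count (previous treatment, current treatment) pairs over adjacent periods."""
    counts = Counter()
    for sequence in sequences:
        for previous, current in zip(sequence, sequence[1:]):
            counts[(previous, current)] += 1
    return counts


def is_strongly_balanced(sequences, treatments="TR"):
    """True if every treatment is preceded by every treatment, itself included,
    the same nonzero number of times."""
    counts = carryover_counts(sequences)
    values = {counts[(p, c)] for p in treatments for c in treatments}
    return len(values) == 1 and 0 not in values


four_sequence_design = ["TRRT", "RTTR", "TTRR", "RRTT"]
three_period_design = ["TRR", "RTT"]
alternating_design = ["TRTR", "RTRT"]  # T never follows T, R never follows R

balanced_four = is_strongly_balanced(four_sequence_design)
balanced_three = is_strongly_balanced(three_period_design)
balanced_alternating = is_strongly_balanced(alternating_design)
```

Both designs shown in this appendix satisfy the check, while a strictly alternating design does not, because no treatment is ever preceded by itself.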

With these designs, no efficiency is lost by including simple first-order carryover effects in the statistical 
model. However, if the possibility of carryover effects is to be considered in the statistical analysis of 
BE studies, the possibility of direct-by-carryover interaction should also be considered. If 
direct-by-carryover interaction is present in the statistical model, these favored designs are no longer optimal. 
Indeed, the TRR/RTT design does not permit an unbiased within-subject estimate of $\mu_T - \mu_R$ in the 
presence of general direct-by-carryover interaction. 

The issue of whether a purely main effects model or a more saturated model should be specified, as 
described in the previous section, also is affected by possible carryover effects. If carryover effects, 
including direct-by-carryover interaction, are included in the statistical model, these effects will be 
partially confounded with sequence-by-treatment interaction in four-sequence or six-sequence 
replicated crossover designs, but not in two-sequence designs. 

In the case of the four-period and three-period designs recommended in section V.A.1, the estimate of 
$\mu_T - \mu_R$, adjusted for first-order carryover effects including direct-by-carryover interaction, is as efficient 
or more efficient than for any other two-treatment replicated crossover designs. 

3. Two-Period Replicated Crossover Designs 

For the majority of drug products, two-period replicated crossover designs such as the Balaam design 
(which uses the sequences TR, RT, TT, and RR) should be avoided for individual BE because subjects 
in the TT or RR sequence do not provide any information on subject-by-formulation interaction. 
However, the Balaam design may be useful for particular drug products (e.g., a long half-life drug for 
which a two-period study would be feasible but a three- or more period study would not). 
APPENDIX C 



Sample Size Determination 

Sample sizes for average BE should be obtained using published formulas. Sample sizes for population 
and individual BE should be based on simulated data. The simulations should be conducted using a 
default situation allowing the two formulations to vary as much as 5% in average BA with equal 
variances and a certain magnitude of subject-by-formulation interaction. The study should have 80% or 
90% power to conclude BE between these two formulations. Sample size also depends on the 
magnitude of variability and the design of the study. Variance estimates to determine the number of 
subjects for a specific drug can be obtained from the biomedical literature and/or pilot studies. 

Tables 1-4 below give sample sizes for 80% and 90% power using the specified study design, given a 
selection of within-subject standard deviations (natural log scale), between-subject standard deviations 
(natural log scale), and subject-by-formulation interaction, as appropriate. 

Table 1 

Average Bioequivalence 
Estimated Numbers of Subjects 
$\Delta = 0.05$ 

                                               80% Power       90% Power 
$\sigma_{WT} = \sigma_{WR}$    $\sigma_D$      2P      4P      2P      4P 
0.15                           0.01            12       6      16       8 
                               0.10            14      10      18      12 
                               0.15            16      12      22      16 
0.23                           0.01            24      12      32      16 
                               0.10            26      16      36      20 
                               0.15            30      18      38      24 
0.30                           0.01            40      20      54      28 
                               0.10            42      24      56      30 
                               0.15            44      26      60      34 
0.50                           0.01           108      54     144      72 
                               0.10           110      58     148      76 
                               0.15           112      60     150      80 

Note: 1. Results for two-period designs (2P) use method of Diletti et al. (Diletti 1991). 

2. Results for four-period designs (4P) use relative efficiency data of Liu (Liu 1995). 
Table 2 

Population Bioequivalence 
Four-Period Design (RTRT/TRTR) 
Estimated Numbers of Subjects 
$\epsilon_P = 0.02$, $\Delta = 0.05$ 

$\sigma_{WT} = \sigma_{WR}$    $\sigma_{BR} = \sigma_{BT}$    80% Power    90% Power 
0.15                           0.15                           18           22 
                               0.30                           24           32 
0.23                           0.23                           22           28 
                               0.46                           24           32 
0.30                           0.30                           22           28 
                               0.60                           26           34 
0.50                           0.50                           22           28 
                               1.00                           26           34 

Note: Results for population BE are approximate from simulation studies 
(1,540 simulations for each parameter combination), assuming two-sequence, 
four-period trials with a balanced design across sequences. 
Table 3 

Individual Bioequivalence 
Estimated Numbers of Subjects 
$\epsilon_I = 0.05$, $\Delta = 0.05$ 

                                               80% Power       90% Power 
$\sigma_{WT} = \sigma_{WR}$    $\sigma_D$      3P      4P      3P      4P 
0.15                           0.01            14      10      18      12 
                               0.10            18      14      24      16 
                               0.15            28      22      36      26 
0.23                           0.01            42      22      54      30 
                               0.10            56      30      74      40 
                               0.15            76      42     100      56 
0.30                           0.01            52      28      70      36 
                               0.10            60      32      82      42 
                               0.15            76      42     100      56 
0.50                           0.01            52      28      70      36 
                               0.10            60      32      82      42 
                               0.15            76      42     100      56 

Note: Results for individual BE are approximate using simulations (5,000 simulations 
for each parameter combination). The designs used in simulations are RTR/TRT (3P) 
and RTRT/TRTR (4P) assuming two-sequence trials with a balanced design across 
sequences. 

While the above sample sizes assume equal within-subject standard deviations, simulation studies for 
3-period and 4-period designs reveal that if $\Delta = 0$ and $\sigma_{WT}^2 - \sigma_{WR}^2 = 0.05$, the sample sizes given will 
provide either 80% or 90% power for these studies. 

To maintain consistency with section V.C, which suggests a minimum of 12 subjects in all BE studies, 
the one case where n = 10 provides 80% power should be increased to n = 12. 
Table 4 

Individual Bioequivalence 
Estimated Numbers of Subjects 
$\epsilon_I = 0.05$, $\Delta = 0.10$ 
With Constraint on $\hat{\Delta}$ (0.8 < exp($\hat{\Delta}$) < 1.25) 

                                               80% Power    90% Power 
$\sigma_{WT} = \sigma_{WR}$    $\sigma_D$      4P           4P 
0.30                           0.01            30           40 
                               0.10            36           48 
                               0.15            42           56 
0.50                           0.01            34           46 
                               0.10            36           48 
                               0.15            42           56 

Note: Results for individual BE are approximate using simulations (5,000 simulations 
for each parameter combination). The designs used in simulations are RTRT/TRTR (4P), 
assuming two-sequence trials with a balanced design across sequences. When $\Delta = 0.05$, 
sample sizes remain the same as given in Table 3. This is because the studies are already 
powered for variance estimation and inference, and therefore, a constraint on the point 
estimate of $\Delta$ has little influence on the sample size for small values of $\Delta$. 
APPENDIX D 



Rationale for Logarithmic Transformation of Pharmacokinetic Data 



A. Clinical Rationale 

The FDA Generic Drugs Advisory Committee recommended in 1991 that the primary comparison of 
interest in a BE study is the ratio, rather than the difference, between average parameter data from the T 
and R formulations. Using logarithmic transformation, the general linear statistical model employed in 
the analysis of BE data allows inferences about the difference between the two means on the log scale, 
which can then be retransformed into inferences about the ratio of the two averages (means or medians) 
on the original scale. Logarithmic transformation thus achieves a general comparison based on the ratio 
rather than the differences. 
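
The retransformation step can be illustrated with a short calculation. The Python sketch below is illustrative only: the AUC values are hypothetical, and it ignores the sequence and period effects of the full statistical model, showing only that the exponentiated difference of log-scale means equals the ratio of geometric means.

```python
import math


def ratio_from_log_means(test_values, reference_values):
    """Exponentiate the difference of mean log values to obtain a T/R ratio."""
    mean_log_t = sum(math.log(v) for v in test_values) / len(test_values)
    mean_log_r = sum(math.log(v) for v in reference_values) / len(reference_values)
    return math.exp(mean_log_t - mean_log_r)


def geometric_mean(values):
    """Geometric mean computed directly on the original scale."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical AUC values for the T and R formulations.
auc_test = [95.0, 110.0, 102.0, 88.0]
auc_reference = [100.0, 105.0, 98.0, 92.0]

ratio = ratio_from_log_means(auc_test, auc_reference)
direct_ratio = geometric_mean(auc_test) / geometric_mean(auc_reference)
```

The two quantities agree, which is why an inference about a difference on the log scale can be read as an inference about a ratio on the original scale.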

B. Pharmacokinetic Rationale 

Westlake observed that a multiplicative model is postulated for pharmacokinetic measures in BA/BE 
studies (i.e., AUC and Cmax, but not Tmax) (Westlake 1973 and 1988). Assuming that elimination of 
the drug is first-order and only occurs from the central compartment, the following equation holds after 
an extravascular route of administration: 

$$AUC_{0-\infty} = FD/CL \qquad \text{Equation 20}$$

$$AUC_{0-\infty} = FD/(V K_e) \qquad \text{Equation 21}$$

where F is the fraction absorbed, D is the administered dose, and FD is the amount of drug absorbed. 
CL is the clearance of a given subject, which is the product of the apparent volume of distribution (V) and 
the elimination rate constant ($K_e$) (footnote 5). The use of AUC as a measure of the amount of drug absorbed 
involves a multiplicative term (CL) that might be regarded as a function of the subject. For this reason, 
Westlake contends that the subject effect is not additive if the data are analyzed on the original scale of 
measurement. 

5 Note that a more general equation can be written for any multicompartmental model as 

$$AUC_{0-\infty} = FD/(V_{d\beta} \lambda_n) \qquad \text{Equation 22}$$

where $V_{d\beta}$ is the volume of distribution relating drug concentration in plasma or blood to the amount of drug in the 
body during the terminal exponential phase, and $\lambda_n$ is the terminal slope of the concentration-time curve. 

Logarithmic transformation of the AUC data will bring the CL ($V K_e$) term into the following equation in 
an additive fashion: 

$$\ln AUC_{0-\infty} = \ln F + \ln D - \ln V - \ln K_e \qquad \text{Equation 23}$$

Similar arguments were given for Cmax. The following equation applies for a drug exhibiting 
one-compartment characteristics: 

$$C_{max} = (FD/V) \times e^{-K_e T_{max}} \qquad \text{Equation 24}$$

where again F, D, and V are introduced into the model in a multiplicative manner. However, after 
logarithmic transformation, the equation becomes 

$$\ln C_{max} = \ln F + \ln D - \ln V - K_e T_{max} \qquad \text{Equation 25}$$

Thus, log transformation of the Cmax data also results in the additive treatment of the V term. 
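
A quick numerical check shows how the multiplicative terms become additive after transformation. The Python sketch below is illustrative only; the parameter values are hypothetical and the function names are not taken from this guidance.

```python
import math


def auc_infinity(fraction, dose, volume, ke):
    """AUC from zero to infinity under first-order elimination: FD / (V * Ke)."""
    return fraction * dose / (volume * ke)


def cmax_one_compartment(fraction, dose, volume, ke, tmax):
    """Cmax for a one-compartment model: (FD / V) * exp(-Ke * Tmax)."""
    return (fraction * dose / volume) * math.exp(-ke * tmax)


# Hypothetical values: fraction absorbed, dose (mg), volume (L), Ke (1/h), Tmax (h).
F, D, V, KE, TMAX = 0.8, 100.0, 40.0, 0.1, 2.0

log_auc = math.log(auc_infinity(F, D, V, KE))
log_auc_additive = math.log(F) + math.log(D) - math.log(V) - math.log(KE)

log_cmax = math.log(cmax_one_compartment(F, D, V, KE, TMAX))
log_cmax_additive = math.log(F) + math.log(D) - math.log(V) - KE * TMAX
```

In both cases the logarithm of the measure equals the sum of separate terms, so a subject-specific quantity such as V enters the log-scale model additively.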






APPENDIX E 



SAS Program Statements for Average BE Analysis of 
Replicated Crossover Studies 



The following illustrates an example of program statements to run the average BE analysis using 
PROC MIXED in SAS version 6.12, with SEQ, SUBJ, PER, and TRT identifying sequence, 
subject, period, and treatment variables, respectively, and Y denoting the response measure (e.g., 
log(AUC), log(Cmax)) being analyzed: 

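PROC MIXED; 
  CLASSES SEQ SUBJ PER TRT; 
  /* Fixed effects: sequence, period, treatment; Satterthwaite degrees of freedom */ 
  MODEL Y = SEQ PER TRT / DDFM=SATTERTH; 
  /* Between-subject covariance by treatment; G prints the estimated matrix */ 
  RANDOM TRT / TYPE=FA0(2) SUB=SUBJ G; 
  /* Separate within-subject variance for each treatment */ 
  REPEATED / GRP=TRT SUB=SUBJ; 
  /* T-R difference with 90% confidence limits */ 
  ESTIMATE 'T vs. R' TRT 1 -1 / CL ALPHA=0.1; 
RUN; 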

The Estimate statement assumes that the code for the T formulation precedes the code for the R 
formulation in sort order (this would be the case, for example, if T were coded as 1 and R were 
coded as 2). If the R code precedes the T code in sort order, the coefficients in the Estimate 
statement would be changed to -1 1. 

In the Random statement, TYPE=FA0(2) could possibly be replaced by TYPE=CSH. This 
guidance recommends that TYPE=UN not be used, as it could result in an invalid (i.e., not non- 
negative definite) estimated covariance matrix. 

Additions and modifications to these statements can be made if the study is carried out in more than 
one group of subjects. 
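
Because Y is on the log scale, the estimate and confidence limits produced by the Estimate statement refer to a difference of log means. The Python sketch below is illustrative only: it shows how such output could be retransformed to the ratio scale and compared with limits of 0.80 and 1.25, which are assumed here as the average BE limits. The numerical inputs and function names are hypothetical, not output from an actual analysis.

```python
import math


def to_ratio_scale(log_estimate, log_lower, log_upper):
    """Exponentiate a log-scale estimate and its confidence limits."""
    return math.exp(log_estimate), math.exp(log_lower), math.exp(log_upper)


def interval_within_limits(lower, upper, lower_limit=0.80, upper_limit=1.25):
    """True if the whole confidence interval lies inside the assumed BE limits."""
    return lower_limit <= lower and upper <= upper_limit


# Hypothetical T-R estimate with 90% confidence limits on the log scale.
ratio, ratio_lower, ratio_upper = to_ratio_scale(0.02, -0.08, 0.12)
passes = interval_within_limits(ratio_lower, ratio_upper)

# A wider hypothetical interval whose upper limit exceeds 1.25.
_, wide_lower, wide_upper = to_ratio_scale(0.10, -0.05, 0.25)
wide_passes = interval_within_limits(wide_lower, wide_upper)
```

The first interval retransforms to roughly 0.92 to 1.13 and lies inside the assumed limits; the second extends beyond 1.25 and does not.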
APPENDIX F 



Method for Statistical Test of Population Bioequivalence Criterion 
Four-Period Crossover Designs 



Appendix F describes a method for using the population BE criterion (see section IV.B, equations 4 
and 5). The procedure involves the computation of a test statistic that is either positive (does not 
conclude population BE) or negative (concludes population BE). 

Consider the following statistical model, which assumes a four-period design with equal replication of 
T and R in each of s sequences, with an assumption of no (or equal) carryover effects (equal 
carryovers go into the period effects): 

$$Y_{ijkl} = \mu_k + \gamma_{ikl} + \delta_{ijk} + \varepsilon_{ijkl}$$

where $i = 1, \ldots, s$ indicates sequence, $j = 1, \ldots, n_i$ indicates subject within sequence i, $k = R, T$ 
indicates treatment, and $l = 1, 2$ indicates replicate on treatment k for subjects within sequence i. $Y_{ijkl}$ is 
the response of replicate l on treatment k for subject j in sequence i, $\gamma_{ikl}$ represents the fixed 
effect of replicate l on treatment k in sequence i, $\delta_{ijk}$ is the random subject effect for subject j in 
sequence i on treatment k, and $\varepsilon_{ijkl}$ is the random error for subject j within sequence i on 
replicate l of treatment k. The $\varepsilon_{ijkl}$'s are assumed to be mutually independent and identically 
distributed as 

$$\varepsilon_{ijkl} \sim N(0, \sigma_{Wk}^2)$$

for $i = 1, \ldots, s$, $j = 1, \ldots, n_i$, $k = R, T$, and $l = 1, 2$. Also, the random subject effects 
$\delta_{ij} = (\mu_R + \delta_{ijR}, \mu_T + \delta_{ijT})'$ are assumed to be mutually independent and distributed as 

$$\delta_{ij} \sim N_2 \left( \begin{pmatrix} \mu_R \\ \mu_T \end{pmatrix}, \begin{pmatrix} \sigma_{BR}^2 & \rho \sigma_{BT} \sigma_{BR} \\ \rho \sigma_{BT} \sigma_{BR} & \sigma_{BT}^2 \end{pmatrix} \right)$$

The following constraint is applied to the nuisance parameters to avoid overparameterization of the 
model for $k = R, T$: 

$$\sum_{i=1}^{s} \sum_{l=1}^{2} \gamma_{ikl} = 0$$


This statistical model proposed by Chinchilli and Esinhart assumes s*p location parameters (where p 
is the number of periods) that can be partitioned into t treatment parameters and sp-t nuisance 
parameters (Chinchilli and Esinhart 1996). This produces a saturated model. The various nuisance 
parameters are estimated in this model, but the focus is on the parameters needed for population BE. 
In some designs, the sequence and period effects can be estimated through a reparametrization of 
the nuisance effects. 

This model definition can be extended to other crossover designs. 

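Linearized Criteria (from section IV.B, equations 4 and 5): 

• Reference-Scaled: 

$$\eta_1 = (\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2) - \theta_P \cdot \sigma_{TR}^2 < 0$$

• Constant-Scaled: 

$$\eta_2 = (\mu_T - \mu_R)^2 + (\sigma_{TT}^2 - \sigma_{TR}^2) - \theta_P \cdot \sigma_{T0}^2 < 0$$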

Estimating the Linearized Criteria: 

The estimation of the linearized criteria depends on study designs. The remaining estimation and 
confidence interval procedures assume a four-period design with equal replication of T and R in each 
of s sequences. The reparametrizations are defined as: 
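
$$U_{Tij} = \tfrac{1}{2}(Y_{ijT1} + Y_{ijT2}), \qquad U_{Rij} = \tfrac{1}{2}(Y_{ijR1} + Y_{ijR2})$$

$$V_{Tij} = \tfrac{1}{\sqrt{2}}(Y_{ijT1} - Y_{ijT2}), \qquad V_{Rij} = \tfrac{1}{\sqrt{2}}(Y_{ijR1} - Y_{ijR2})$$

$$I_{ij} = \bar{Y}_{ijT\cdot} - \bar{Y}_{ijR\cdot}$$

for $i = 1, \ldots, s$ and $j = 1, \ldots, n_i$, where $\bar{Y}_{ijk\cdot} = \tfrac{1}{2}(Y_{ijk1} + Y_{ijk2})$ for $k = R, T$. 

Compute the formulation means pooling across sequences: 

$$\hat{\mu}_k = \frac{1}{s} \sum_{i=1}^{s} \bar{Y}_{i \cdot k \cdot}, \quad k = R, T, \qquad \text{and} \qquad \hat{\Delta} = \hat{\mu}_T - \hat{\mu}_R$$

where 

$$\bar{Y}_{i \cdot k \cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{1}{2} \sum_{l=1}^{2} Y_{ijkl}$$

Compute the variances of $I_{ij}$, $U_{Tij}$, $U_{Rij}$, $V_{Tij}$, $V_{Rij}$, pooling across sequences, and denote these variance 
estimates by $M_I$, $MU_T$, $MU_R$, $MV_T$, $MV_R$, respectively. Specifically, 

$$MU_T = \frac{1}{n_{U_T}} \sum_{i=1}^{s} \sum_{j=1}^{n_i} (U_{Tij} - \bar{U}_{Ti})^2$$

$$MV_T = \frac{1}{n_{V_T}} \sum_{i=1}^{s} \sum_{j=1}^{n_i} (V_{Tij} - \bar{V}_{Ti})^2$$

with $M_I$, $MU_R$, and $MV_R$ computed in the same way, and 

$$n_I = n_{U_T} = n_{U_R} = n_{V_T} = n_{V_R} = \left( \sum_{i=1}^{s} n_i \right) - s$$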

Then, the linearized criteria are estimated by: 

• Reference-Scaled: 

$$\hat{\eta}_1 = \hat{\Delta}^2 + MU_T + 0.5 \cdot MV_T - (1 + \theta_P) \cdot [MU_R + 0.5 \cdot MV_R]$$

• Constant-Scaled: 

$$\hat{\eta}_2 = \hat{\Delta}^2 + MU_T + 0.5 \cdot MV_T - (1) \cdot [MU_R + 0.5 \cdot MV_R] - \theta_P \cdot \sigma_{T0}^2$$
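
The two point estimates can be computed directly from the pooled variance estimates. The Python sketch below is illustrative only. This guidance does not state values for the population BE limit or the changeover standard deviation, so the values used here are hypothetical placeholders, as are the function name and the variance inputs.

```python
def population_linearized_estimates(delta_hat, mu_t, mv_t, mu_r, mv_r, theta_p, sigma_t0):
    """Point estimates of the reference-scaled and constant-scaled criteria.

    mu_t, mv_t, mu_r, mv_r stand for the pooled variance estimates
    MU_T, MV_T, MU_R, MV_R (they are variances, not means).
    """
    total_var_t = mu_t + 0.5 * mv_t  # estimates the total variance of T
    total_var_r = mu_r + 0.5 * mv_r  # estimates the total variance of R
    eta_1 = delta_hat ** 2 + total_var_t - (1.0 + theta_p) * total_var_r
    eta_2 = delta_hat ** 2 + total_var_t - total_var_r - theta_p * sigma_t0 ** 2
    return eta_1, eta_2


# Hypothetical inputs; theta_p and sigma_t0 are placeholders, not recommended values.
eta_1, eta_2 = population_linearized_estimates(
    delta_hat=0.05, mu_t=0.035, mv_t=0.02, mu_r=0.03, mv_r=0.02,
    theta_p=1.7, sigma_t0=0.2,
)
```

In this example the estimated total variance of R equals the square of the placeholder changeover value, so the two criteria coincide. They diverge as soon as the reference variance moves away from that value.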

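95% Upper Confidence Bounds for Criteria: 

The table below illustrates the construction of a $(1 - \alpha)$ level upper confidence bound based on the 
two-sequence, four-period design, for the reference-scaled criterion, $\hat{\eta}_1$. Use $\alpha = 0.05$ for a 95% 
upper confidence bound. 

$$\begin{array}{lll}
H_q = \text{Confidence Bound} & E_q = \text{Point Estimate} & U_q = (H_q - E_q)^2 \\ \hline
H_D = \left( |\hat{\Delta}| + t_{1-\alpha, n-s} \left( \frac{1}{s^2} \sum_{i=1}^{s} n_i^{-1} M_I \right)^{1/2} \right)^2 & E_D = \hat{\Delta}^2 & U_D \\
H_1 = \frac{(n-s) \cdot E_1}{\chi^2_{\alpha, n-s}} & E_1 = MU_T & U_1 \\
H_2 = \frac{(n-s) \cdot E_2}{\chi^2_{\alpha, n-s}} & E_2 = 0.5 \cdot MV_T & U_2 \\
H_{3rs} = \frac{(n-s) \cdot E_{3rs}}{\chi^2_{1-\alpha, n-s}} & E_{3rs} = -(1 + \theta_P) \cdot MU_R & U_{3rs} \\
H_{4rs} = \frac{(n-s) \cdot E_{4rs}}{\chi^2_{1-\alpha, n-s}} & E_{4rs} = -(1 + \theta_P) \cdot 0.5 \cdot MV_R & U_{4rs}
\end{array}$$

$H_{\eta_1} = \sum E_q + \left( \sum U_q \right)^{1/2}$ is the upper 95% confidence bound for $\eta_1$. Note $n = \sum_{i=1}^{s} n_i$, where s is 
the number of sequences, $n_i$ is the number of subjects per sequence, $M_I$ is the pooled variance estimate of the 
within-subject differences $I_{ij} = \bar{Y}_{ijT\cdot} - \bar{Y}_{ijR\cdot}$, $t_{1-\alpha, n-s}$ is the corresponding quantile of 
the t distribution, and $\chi^2_{\alpha, n-s}$ is from the 
cumulative distribution function of the chi-square distribution with n-s degrees of freedom, i.e., 
$\Pr(\chi^2_{n-s} \le \chi^2_{\alpha, n-s}) = \alpha$. The confidence bound for $\eta_2$ is computed similarly, adjusting the 
constants associated with the variance components where appropriate (in particular, the constant 
associated with $MU_R$ and $MV_R$). 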



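$$\begin{array}{lll}
H_q = \text{Confidence Bound} & E_q = \text{Point Estimate} & U_q = (H_q - E_q)^2 \\ \hline
H_D = \left( |\hat{\Delta}| + t_{1-\alpha, n-s} \left( \frac{1}{s^2} \sum_{i=1}^{s} n_i^{-1} M_I \right)^{1/2} \right)^2 & E_D = \hat{\Delta}^2 & U_D \\
H_1 = \frac{(n-s) \cdot E_1}{\chi^2_{\alpha, n-s}} & E_1 = MU_T & U_1 \\
H_2 = \frac{(n-s) \cdot E_2}{\chi^2_{\alpha, n-s}} & E_2 = 0.5 \cdot MV_T & U_2 \\
H_{3cs} = \frac{(n-s) \cdot E_{3cs}}{\chi^2_{1-\alpha, n-s}} & E_{3cs} = -1 \cdot MU_R & U_{3cs} \\
H_{4cs} = \frac{(n-s) \cdot E_{4cs}}{\chi^2_{1-\alpha, n-s}} & E_{4cs} = -0.5 \cdot MV_R & U_{4cs}
\end{array}$$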



Using the mixed-scaling approach, to test for population BE, compute the 95% upper confidence 
bound of either the reference-scaled or constant-scaled linearized criterion. The selection of either 
reference-scaled or constant-scaled approach depends on the study estimate of total standard 
deviation of the reference product (estimated by $[MU_R + 0.5 \cdot MV_R]^{1/2}$ in the four-period design). If 
the study estimate of standard deviation is $\le \sigma_{T0}$, the constant-scaled criterion and its associated 
confidence interval should be computed. Otherwise, the reference-scaled criterion and its 
confidence interval should be computed. The procedure for computing each of the confidence 
bounds is described above. If the upper confidence bound for the appropriate criterion is negative 
or zero, conclude population BE. If the upper bound is positive, do not conclude population BE. 
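
The final steps of the procedure can be sketched as follows. The Python sketch below is illustrative only. It takes the component bounds as inputs, since obtaining them needs chi-square and t quantiles from a statistics package that is not shown here. It covers only the listed components and adds no constant term. The function names and numerical inputs are hypothetical.

```python
import math


def upper_confidence_bound(point_estimates, component_bounds):
    """Sum of the point estimates plus the square root of the sum of the
    squared distances between each component bound and its point estimate."""
    total = sum(point_estimates)
    spread = sum((h - e) ** 2 for e, h in zip(point_estimates, component_bounds))
    return total + math.sqrt(spread)


def choose_scaling(reference_sd_estimate, changeover_sd):
    """Constant-scaling at or below the changeover point, else reference-scaling."""
    return "constant" if reference_sd_estimate <= changeover_sd else "reference"


def concludes_be(upper_bound):
    """BE is concluded when the upper confidence bound is negative or zero."""
    return upper_bound <= 0.0


# Hypothetical point estimates and component bounds for five components.
estimates = [0.0025, 0.03, 0.01, -0.08, -0.03]
bounds = [0.01, 0.05, 0.016, -0.06, -0.022]

bound = upper_confidence_bound(estimates, bounds)
decision = concludes_be(bound)
scaling = choose_scaling(reference_sd_estimate=0.25, changeover_sd=0.2)
```

With these hypothetical inputs the upper bound is about -0.037, so BE would be concluded for the criterion selected by the scaling rule.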
APPENDIX G 



Method for Statistical Test of Individual Bioequivalence Criterion 



Appendix G describes a method for using the individual BE criterion (see section IV.C, equations 6 
and 7). The procedure (Hyslop, Hsuan, and Holder 2000) involves the computation of a test 
statistic that is either positive (does not conclude individual BE) or negative (concludes individual 
BE). 

Consider the following statistical model that assumes a four-period design with equal replication of T 
and R in each of s sequences with an assumption of no (or equal) carryover effects (equal 
carryovers go into the period effects): 

$$Y_{ijkl} = \mu_k + \gamma_{ikl} + \delta_{ijk} + \varepsilon_{ijkl}$$

where $i = 1, \ldots, s$ indicates sequence, $j = 1, \ldots, n_i$ indicates subject within sequence i, $k = R, T$ 
indicates treatment, and $l = 1, 2$ indicates replicate on treatment k for subjects within sequence i. $Y_{ijkl}$ is the 
response of replicate l on treatment k for subject j in sequence i, $\gamma_{ikl}$ represents the fixed effect 
of replicate l on treatment k in sequence i, $\delta_{ijk}$ is the random subject effect for subject j in 
sequence i on treatment k, and $\varepsilon_{ijkl}$ is the random error for subject j within sequence i on 
replicate l of treatment k. The $\varepsilon_{ijkl}$'s are assumed to be mutually independent and identically 
distributed as 

$$\varepsilon_{ijkl} \sim N(0, \sigma_{Wk}^2)$$

for $i = 1, \ldots, s$, $j = 1, \ldots, n_i$, $k = R, T$, and $l = 1, 2$. Also, the random subject effects 
$\delta_{ij} = (\mu_R + \delta_{ijR}, \mu_T + \delta_{ijT})'$ are assumed to be mutually independent and distributed as 

$$\delta_{ij} \sim N_2 \left( \begin{pmatrix} \mu_R \\ \mu_T \end{pmatrix}, \begin{pmatrix} \sigma_{BR}^2 & \rho \sigma_{BT} \sigma_{BR} \\ \rho \sigma_{BT} \sigma_{BR} & \sigma_{BT}^2 \end{pmatrix} \right)$$

The following constraint is applied to the nuisance parameters to avoid overparameterization of the 
model for $k = R, T$: 

$$\sum_{i=1}^{s} \sum_{l=1}^{2} \gamma_{ikl} = 0$$

This statistical model proposed by Chinchilli and Esinhart assumes s*p location parameters (where p 
is the number of periods) that can be partitioned into t treatment parameters and sp-t nuisance 
parameters (Chinchilli and Esinhart, 1996). This produces a saturated model. The various nuisance 
parameters are estimated in this model, but the focus is on the parameters needed for individual BE. 
In some designs, the sequence and period effects can be estimated through a reparametrization of the 
nuisance effects. 

This model definition can be extended to other crossover designs. 
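Linearized Criteria (from section IV.C, equations 6 and 7): 

• Reference-Scaled: 

$$\eta_1 = (\mu_T - \mu_R)^2 + \sigma_D^2 + (\sigma_{WT}^2 - \sigma_{WR}^2) - \theta_I \cdot \sigma_{WR}^2 < 0$$

• Constant-Scaled: 

$$\eta_2 = (\mu_T - \mu_R)^2 + \sigma_D^2 + (\sigma_{WT}^2 - \sigma_{WR}^2) - \theta_I \cdot \sigma_{W0}^2 < 0$$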

Estimating the Linearized Criteria: 

The estimation of the linearized criteria depends on study designs. The remaining estimation and 
confidence interval procedures assume a four-period design with equal replication of T and R in each 
of s sequences. The reparametrizations are defined as: 

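$$I_{ij} = \bar{Y}_{ijT\cdot} - \bar{Y}_{ijR\cdot}$$

$$T_{ij} = Y_{ijT1} - Y_{ijT2}$$

$$R_{ij} = Y_{ijR1} - Y_{ijR2}$$

for $i = 1, \ldots, s$ and $j = 1, \ldots, n_i$, where $\bar{Y}_{ijk\cdot} = \tfrac{1}{2}(Y_{ijk1} + Y_{ijk2})$ for $k = R, T$. 

Compute the formulation means, and the variances of $I_{ij}$, $T_{ij}$, and $R_{ij}$, pooling across sequences, 
and denote these variance estimates by $M_I$, $M_T$, and $M_R$, respectively, where 

$$\hat{\mu}_k = \frac{1}{s} \sum_{i=1}^{s} \bar{Y}_{i \cdot k \cdot}, \quad k = R, T, \qquad \text{and} \qquad \hat{\Delta} = \hat{\mu}_T - \hat{\mu}_R$$

$$M_I = \hat{\sigma}_I^2 = \frac{1}{n_I} \sum_{i=1}^{s} \sum_{j=1}^{n_i} (I_{ij} - \bar{I}_i)^2$$

$$M_T = \hat{\sigma}_{WT}^2 = \frac{1}{2 n_T} \sum_{i=1}^{s} \sum_{j=1}^{n_i} (T_{ij} - \bar{T}_i)^2$$

$$M_R = \hat{\sigma}_{WR}^2 = \frac{1}{2 n_R} \sum_{i=1}^{s} \sum_{j=1}^{n_i} (R_{ij} - \bar{R}_i)^2$$

$$n_I = n_T = n_R = \left( \sum_{i=1}^{s} n_i \right) - s$$

Then, the linearized criteria are estimated by: 

• Reference-Scaled: 

$$\hat{\eta}_1 = \hat{\Delta}^2 + M_I + 0.5 \cdot M_T - (1.5 + \theta_I) \cdot M_R$$

• Constant-Scaled: 

$$\hat{\eta}_2 = \hat{\Delta}^2 + M_I + 0.5 \cdot M_T - 1.5 \cdot M_R - \theta_I \cdot \sigma_{W0}^2$$

and the subject-by-formulation interaction variance component can be estimated by: 

$$\hat{\sigma}_D^2 = M_I - 0.5 \cdot (M_T + M_R)$$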



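95% Upper Confidence Bounds for Criteria: 

The table below illustrates the construction of a $(1 - \alpha)$ level upper confidence bound based on the 
two-sequence, four-period design, for the reference-scaled criterion, $\hat{\eta}_1$. Use $\alpha = 0.05$ for a 95% 
upper confidence bound. 

$$\begin{array}{lll}
H_q = \text{Confidence Bound} & E_q = \text{Point Estimate} & U_q = (H_q - E_q)^2 \\ \hline
H_D = \left( |\hat{\Delta}| + t_{1-\alpha, n-s} \left( \frac{1}{s^2} \sum_{i=1}^{s} n_i^{-1} M_I \right)^{1/2} \right)^2 & E_D = \hat{\Delta}^2 & U_D \\
H_I = \frac{(n-s) \cdot M_I}{\chi^2_{\alpha, n-s}} & E_I = M_I & U_I \\
H_T = \frac{0.5 \cdot (n-s) \cdot M_T}{\chi^2_{\alpha, n-s}} & E_T = 0.5 \cdot M_T & U_T \\
H_R = \frac{-(1.5 + \theta_I) \cdot (n-s) \cdot M_R}{\chi^2_{1-\alpha, n-s}} & E_R = -(1.5 + \theta_I) \cdot M_R & U_R
\end{array}$$

where $n = \sum_{i=1}^{s} n_i$, s is the number of sequences, $t_{1-\alpha, n-s}$ is the corresponding quantile of the 
t distribution, and $\chi^2_{\alpha, n-s}$ is from the cumulative distribution 
function of the chi-square distribution with n-s degrees of freedom, i.e., $\Pr(\chi^2_{n-s} \le \chi^2_{\alpha, n-s}) = \alpha$. 

Then, $H_{\eta_1} = \sum E_q + \left( \sum U_q \right)^{1/2}$ is the upper 95% confidence bound for $\eta_1$. The confidence bound 
for $\eta_2$ is computed similarly, adjusting the constants associated with the variance components where 
appropriate (in particular, the constant associated with $M_R$). 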
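
$$\begin{array}{lll}
H_q = \text{Confidence Bound} & E_q = \text{Point Estimate} & U_q = (H_q - E_q)^2 \\ \hline
H_D = \left( |\hat{\Delta}| + t_{1-\alpha, n-s} \left( \frac{1}{s^2} \sum_{i=1}^{s} n_i^{-1} M_I \right)^{1/2} \right)^2 & E_D = \hat{\Delta}^2 & U_D \\
H_I = \frac{(n-s) \cdot M_I}{\chi^2_{\alpha, n-s}} & E_I = M_I & U_I \\
H_T = \frac{0.5 \cdot (n-s) \cdot M_T}{\chi^2_{\alpha, n-s}} & E_T = 0.5 \cdot M_T & U_T \\
H_R = \frac{-(1.5) \cdot (n-s) \cdot M_R}{\chi^2_{1-\alpha, n-s}} & E_R = -(1.5) \cdot M_R & U_R
\end{array}$$
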

Using the mixed-scaling approach, to test for individual BE, compute the 95% upper confidence 
bound of either the reference-scaled or constant-scaled linearized criterion. The selection of either 
reference-scaled or constant-scaled criterion depends on the study estimate of within-subject 
standard deviation of the reference product. If the study estimate of standard deviation is < <j WI} , the 
constant-scaled criterion and its associated confidence interval should be computed. Otherwise, the 
reference-scaled criterion and its confidence interval should be computed. The procedure for 
computing each of the confidence bounds is described above. If the upper confidence bound for the 
appropriate criterion is negative or zero, conclude individual BE. If the upper bound is positive, do 
not conclude individual BE. 

This guidance recommends that sponsors use either reference-scaling or constant-scaling at the 
changeover point (see section VII.D, Discontinuity). To test for individual BE, compute the 95% 
upper confidence bounds of both reference-scaled and constant-scaled linearized criteria. The 
procedure for computing these confidence bounds is described above. If the upper bound of either 
criterion is negative or zero (either $H_{\eta_1}$ or $H_{\eta_2}$), conclude individual BE. If the upper bounds of
both criteria are positive, do not conclude individual BE.
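The two decision rules described above can be written as short predicates, as in the Python sketch below. The upper confidence bounds are assumed to have been computed already; the function and argument names are hypothetical.

```python
def mixed_scaling_decision(s_wr, sigma_w0, h_eta1, h_eta2):
    """Mixed scaling: choose the criterion from the study estimate of the
    reference within-subject standard deviation, then test its bound.

    s_wr:     study estimate of the within-subject SD of the reference
    sigma_w0: changeover value of the within-subject SD
    h_eta1:   upper bound for the reference-scaled criterion
    h_eta2:   upper bound for the constant-scaled criterion
    """
    bound = h_eta2 if s_wr <= sigma_w0 else h_eta1
    return bound <= 0  # True means: conclude individual BE


def either_criterion_decision(h_eta1, h_eta2):
    """Conclude individual BE if either upper bound is negative or zero."""
    return h_eta1 <= 0 or h_eta2 <= 0
```

The second rule never rejects a study that the first rule accepts, because the bound selected under mixed scaling is always one of the two bounds examined by the second rule.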
APPENDIX H



Variance Estimation 



Relatively simple unbiased estimators, the method of moments (MM) or the restricted maximum 
likelihood (REML) method, can be used to estimate the mean and variance parameters in the 
individual BE approach. A key distinction between the REML and MM methods relates to 
differences in estimating variance terms. The REML method estimates each of the three variances, 
$\sigma_D^2$, $\sigma_{WR}^2$, and $\sigma_{WT}^2$, separately and then combines them in the individual BE criterion. The REML
estimate of $\sigma_D^2$ is found from estimates of $\sigma_{BR}^2$, $\sigma_{BT}^2$, and the correlation, $\rho$. The MM approach
is to estimate the sum of the variance terms in the numerator of the criterion, $\sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2$,
and does not necessarily estimate each component separately. One consequence of this difference
is that the MM estimator of $\sigma_D^2$ is unbiased but could be negative. The REML approach can also
lead to negative estimates, but if the covariance matrix of the random effects is forced to be a
proper covariance matrix, the estimate of $\sigma_D^2$ can be made to be non-negative. This forced
non-negativity has the effect of making the estimate positively biased and introduces a small amount of
conservatism to the confidence bound. The REML method can be used in special cases (e.g., when 
substantial missing data are present). In addition, the MM approaches have not yet been adapted 
to models that allow assessment of carryover effects. 
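The trade-off between an unbiased but possibly negative estimate and a non-negative but positively biased one can be illustrated with a small simulation, sketched below in Python. Several assumptions are made purely for illustration: $M_I$, $M_T$, and $M_R$ are drawn as independent scaled chi-square variates, the MM-type estimate of $\sigma_D^2$ is taken to be $M_I - 0.5 \cdot (M_T + M_R)$, and truncation at zero stands in for a non-negativity constraint. The truncation is not an implementation of REML, and all names are hypothetical.

```python
import random


def simulate_sigma_d2(sigma_d2, sigma_wt2, sigma_wr2, dof, reps, seed=0):
    """Return (mean raw estimate, mean truncated estimate, share negative)."""
    rng = random.Random(seed)

    def scaled_chi2(variance):
        # variance * chi-square(dof) / dof, drawn as a gamma variate
        return variance * rng.gammavariate(dof / 2.0, 2.0) / dof

    # Assumed expectation of M_I
    sigma_i2 = sigma_d2 + 0.5 * (sigma_wt2 + sigma_wr2)
    raw = []
    for _ in range(reps):
        m_i = scaled_chi2(sigma_i2)
        m_t = scaled_chi2(sigma_wt2)
        m_r = scaled_chi2(sigma_wr2)
        raw.append(m_i - 0.5 * (m_t + m_r))
    truncated = [max(x, 0.0) for x in raw]  # forced non-negativity
    share_negative = sum(x < 0 for x in raw) / reps
    return sum(raw) / reps, sum(truncated) / reps, share_negative
```

With a true $\sigma_D^2$ of zero, the raw estimates scatter on both sides of zero and average close to it, while the truncated estimates average above it, which is the positive bias described above.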